Effects of COVID-19 pandemic on hospital-acquired infections
BMIN503/EPID600 Final Project
Author
Kevin Mears
Use this template to complete your project throughout the course. Your Final Project presentation will be based on the contents of this document. Replace the title/name above and text below with your own, but keep the headers. Feel free to change the theme and other display settings, although this is not required.
1 Overview
Give a brief a description of your project and its goal(s), what data you are using to complete it, and what two faculty/staff in different fields you have spoken to about your project with a brief summary of what you learned from each person. Include a link to your final project GitHub repository.
The goal of my final project is to determine how the COVID-19 pandemic and the surge in hospital visits, changes in cleaning/hygiene, and people staying at home affected the incidence of hospital-associated infections. I identified the 2020 data set from the CDC National Hospital Care Survey (NHCS) which contains 132,694 inpatients and 388,753 emergency department visits. It contains information including patient age, sex, the month they were discharged, and their diagnoses. I will use this data to determine how the incidence of hospital-associated infections changed month-to-month in 2020 and whether there are correlations with other parameters (e.g. COVID-19 diagnosis).
I spoke with Dr. Joseph Zackular (Assistant Professor of Pathology and Laboratory Medicine at CHOP) who is a bacteriologist and an expert on C. difficile, the most common hospital-acquired infection. He suggested I stratify the dataset based on sex and age as demographics that might affect hospital infections. He suggested I look at bacterial pneumonia as a positive control as it is commonly associated with COVID co-infection.
I also spoke with Dr. Kyle Bittinger (Bioinformatics Laboratory Director, CHOP Microbiome Center) who generously offered to help with the code where needed.
Describe the problem addressed, its significance, and some background to motivate the problem. This should extend what is in the Section 1.
Explain why your problem is interdisciplinary, what fields can contribute to its understanding, and incorporate background related to what you learned from meeting with faculty/staff.
Hospital-acquired infections (HAI) are infections acquired after hospital admission. It includes catheter-associated urinary tract infections, central line-associated blood stream infections, surgical site infections, ventilator-associated and hospital-acquired pneumonia, and Clostridioides difficile infections. The risk of HAIs depend on many factors, including the facility’s disinfection and infection prevention practices, the patient’s immune status, length of stay in the hospital, co-morbitities, ventilator support, and use of invasive procedures. Receipt of antibiotics is one of the major risk factors for developing Clostridioides difficile infection and other multidrug resistant bacterial infections (e.g. Vancomycin resistant Enterococcus or Methicillin resistant Staphylococcus aureus). According to the CDC, about 4% of hospitalized patients experience at least one HAI, with an estimated 648,000 cases in 2011; these are dominated by pneumonia, surgical site infections, gastrointestinal infections, UTIs, and bloodstream infections. Clostidioides difficile infection is the most common cause of hospital acquired infections and can cause severe, potentially life-threatening colitis. Common causes of hospital-associated and ventilator-associated pneumonia are Staphylococcus aureus, Pseudomonas aeruginosa, E. coli, and Klebsiella penumoniae. Common causes of catheter-associated UTIs are Enterococcus, Staphylcoccus aureus, Pseudomonas, Proteus, Klebsiella, and Candida. Common causes of surgical site infections include Staphylococcus aureus, other Staphylococcus, Enterococcus, E. coli, Pseudomonas aeruginosa, Enterobacter, and Klebsiella.
The COVID-19 pandemic led to a surge in hospital visits, overwhelming our healthcare system and led to over a million deaths. Coincidently, disinfection practices intensified in public facilities including hospitals. Many causes of hospital-associated infections, including C. difficile, MRSA, VRE, norovirus are difficult to disinfect using traditional cleaning methods. The present study aims to determine how the COVID-19 pandemic affected the incidence of hospital-associated infections in 2020.
This is an interdisciplinary question because it requires an understanding of microbiology and the various factors that affect pathogenesis, epidemiology to appreciate the how disease is spread in a hospital setting, and data science for bioinformatics analysis of hospital survey data.
Sources:
Monegro et al. Hospital-Acquired Infections. (2023). In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing. https://www.ncbi.nlm.nih.gov/books/NBK441857/
Dewey et al. Increased Use of Disinfectants During the COVID-19 Pandemic and Its Potential Impacts on Health and Safety. (2021) Acs Chem Health Safety.
Dancer, S.J. Controlling Hospital-Acquired Infection: Focus on the Role of the Environment and New Technologies for Decontamination. (2014) Clin Microbiol Rev.
3 Methods
Describe the data used and general methodological approach used to address the problem described in the Section 2. Subsequently, incorporate full R code necessary to retrieve and clean data, and perform analysis. Be sure to include a description of code so that others (including your future self) can understand what you are doing and why.
library("tidyverse")
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("ggplot2")library("cowplot")
Attaching package: 'cowplot'
The following object is masked from 'package:lubridate':
stamp
Inpatient data set
# Read in NHCS 2020 inpatient Public Use File R Dataset#https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHCS/2020/R/nhcs2020ip_r.rdsurl <-"https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHCS/2020/R/nhcs2020ip_r.rds"nhcs2020ip <-read_rds(url)
# inpatient data set#pull out variables listvar_2020 <- nhcs2020ip$variables#select useful columnsvar_2020_select <-select(var_2020, 1:71)#variable names varnames_2020 <-colnames(var_2020_select)#pull out primary diagnosis (DX1)diag1_2020 <- var_2020_select$DX1#determine the number of primary C. diff infections (ICD-10 diagnosis code: A047)primaryCdiff_2020 <-sum(diag1_2020 =="A047", na.rm =TRUE)primaryCdiff_2020
[1] 245
There were 245 cases of primary C. difficile infection.
# count the number of patients that stayed for only 1 dayvar_2020_LOS_1 <- var_2020_select |>filter(LOS ==1) |>summarize(count =n()) |>pull(count)ip_total_count <- var_2020_select |>group_by(LOS) |>summarize(count =n()) |>summarize(sum(count))prop_ip_routine <- var_2020_LOS_1 / ip_total_countprop_ip_routine
#re-level agecovid_patients <- covid_patients |>mutate(age.simplified =cut(AGE, breaks =c(0, 5, 10, 15, 20, seq(30, 100, by =10)), right =FALSE, labels =c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")))#count(covid_patients, age.simplified)#count(covid_patients, AGE)#count(covid_patients, SEX)#count(covid_patients, LOS) #length of stay#count(covid_patients, LOS_30DAYS) #greater than 30 days#count(covid_patients, DISCHARGE_MONTH)#count(covid_patients, DISCHARGE_STATUS)
#visualizationggplot(covid_patients, aes(x = SEX)) +geom_bar() +labs(x ="Sex", y ="Count") +ggtitle("COVID-19 cases by sex") +theme_bw()
ggplot(covid_patients, aes(x = age.simplified)) +geom_bar() +labs(x ="Age", y ="Count") +ggtitle("COVID-19 cases by age") +theme_bw()
ggplot(covid_patients, aes(x = LOS)) +geom_bar() +labs(x ="Length of stay (up to 14 days)", y ="Count") +ggtitle("COVID-19 cases by length of stay") +theme_bw()
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_count()`).
#greater than 15 binned in dataggplot(covid_patients, aes(x = LOS_30DAYS)) +geom_bar() +labs(x ="Length of stay greater than 30 days", y ="Count") +ggtitle("COVID-19 cases by length of stay greater than 30 days") +theme_bw()
ggplot(covid_patients, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +ggtitle("COVID-19 cases by discharge month") +theme_bw()
#subset bacterial pneumonia patientsbactpneumo_patients <-filter(alldiag_2020, DX %in%c("J13", "J14", "J150", "J151", "J1520", "J15211", "J1529", "J153", "J154", "J155", "J1561", "J1569", "J157", "J158", "J159", "J160"))#J13 Pneumonia due to Streptococcus pneumoniae#J14 Pneumonia due to Hemophilus influenzae#J150 Pneumonia due to Klebsiella pneumoniae#J151 Pneumonia due to Pseudomonas#J1520 Pneumonia due to staphylococcus, unspecified#J15211 Pneumonia due to Methicillin susceptible Staphylococcus aureus#J15212 Pneumonia due to Methicillin resistant Staphylococcus aureus#J1529 Pneumonia due to other staphylococcus#J153 Pneumonia due to streptococcus, group B#J154 Pneumonia due to other streptococci#J155 Pneumonia due to Escherichia coli#J1561 Pneumonia due to Acinetobacter baumannii#J1569 Pneumonia due to other Gram-negative bacteria#J157 Pneumonia due to Mycoplasma pneumoniae#J158 Pneumonia due to other specified bacteria#J159 Unspecified bacterial pneumonia#J160 Chlamydial pneumonia
#re-level agebactpneumo_patients <- bactpneumo_patients |>mutate(age.simplified =cut(AGE, breaks =c(0, 5, 10, 15, 20, seq(30, 100, by =10)), right =FALSE, labels =c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")))#count(bactpneumo_patients, age.simplified)#count(bactpneumo_patients, SEX)#count(bactpneumo_patients, LOS) #length of stay#count(bactpneumo_patients, LOS_30DAYS) #greater than 30 days#count(bactpneumo_patients, DISCHARGE_MONTH)#count(bactpneumo_patients, DISCHARGE_STATUS)
#visualizationggplot(bactpneumo_patients, aes(x = SEX)) +geom_bar() +labs(x ="Sex", y ="Count") +ggtitle("Bacterial penumonia cases by sex") +theme_bw()
ggplot(bactpneumo_patients, aes(x = age.simplified)) +geom_bar() +labs(x ="Age", y ="Count") +ggtitle("Bacterial penumonia cases by age") +theme_bw()
ggplot(bactpneumo_patients, aes(x = LOS)) +geom_bar() +labs(x ="Length of stay (up to 14 days)", y ="Count") +ggtitle("Bacterial penumonia cases by length of stay") +theme_bw()
#greater than 15 binned in dataggplot(bactpneumo_patients, aes(x = LOS_30DAYS)) +geom_bar() +labs(x ="Length of stay greater than 30 days", y ="Count") +ggtitle("Bacterial penumonia cases by length of stay greater than 30 days") +theme_bw()
ggplot(bactpneumo_patients, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +ggtitle("Bacterial penumonia cases by discharge month") +theme_bw()
#plot number of cases by demographics#re-level ageCdiff_patients <- Cdiff_patients |>mutate(age.simplified =cut(AGE, breaks =c(0, 5, 10, 15, 20, seq(30, 100, by =10)), right =FALSE, labels =c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")))#count(Cdiff_patients, age.simplified)#count(Cdiff_patients, SEX)#count(Cdiff_patients, LOS) #length of stay#count(Cdiff_patients, LOS_30DAYS) #greater than 30 days#count(Cdiff_patients, DISCHARGE_MONTH)#count(Cdiff_patients, DISCHARGE_STATUS)
#visualizationggplot(Cdiff_patients, aes(x = SEX)) +geom_bar() +labs(x ="Sex", y ="Count") +ggtitle("C. difficile infection by sex") +theme_bw()
ggplot(Cdiff_patients, aes(x = age.simplified)) +geom_bar() +labs(x ="Age", y ="Count") +ggtitle("C. difficile infection by age") +theme_bw()
ggplot(Cdiff_patients, aes(x = LOS)) +geom_bar() +labs(x ="Length of stay (up to 14 days)", y ="Count") +ggtitle("C. difficile infection by length of stay") +theme_bw()
#greater than 15 binned in dataggplot(Cdiff_patients, aes(x = LOS_30DAYS)) +geom_bar() +labs(x ="Length of stay greater than 30 days", y ="Count") +ggtitle("C. difficile infection by length of stay greater than 30 days") +theme_bw()
ggplot(Cdiff_patients, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +ggtitle("C. difficile infection by discharge month") +theme_bw()
#MRSAmrsa_patients <-filter(alldiag_2020, DX %in%c("A4102", "A4902", "B9562", "J15212"))#A4102 Sepsis due to Methicillin resistant Staphylococcus aureus#A4902 Methicillin resistant Staphylococcus aureus infection, unspecified site#B9562 Methicillin resistant Staphylococcus aureus infection as the cause of diseases classified elsewhere#J15212 Pneumonia due to Methicillin resistant Staphylococcus aureus# no MRSA cases in dataset
#enterococcusentero_patients <-filter(alldiag_2020, DX %in%c("A4181", "B952"))#A4181 Sepsis due to Enterococcus#B952 Enterococcus as the cause of diseases classified elsewhere
#plot number of cases by demographics#re-level ageentero_patients <- entero_patients |>mutate(age.simplified =cut(AGE, breaks =c(0, 5, 10, 15, 20, seq(30, 100, by =10)), right =FALSE, labels =c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")))#count(entero_patients, age.simplified)#count(entero_patients, SEX)#count(entero_patients, LOS) #length of stay#count(entero_patients, LOS_30DAYS) #greater than 30 days#count(entero_patients, DISCHARGE_MONTH)#count(entero_patients, DISCHARGE_STATUS)
#visualizationggplot(entero_patients, aes(x = SEX)) +geom_bar() +labs(x ="Sex", y ="Count") +ggtitle("Enterococcus infections by sex") +theme_bw()
ggplot(entero_patients, aes(x = age.simplified)) +geom_bar() +labs(x ="Age", y ="Count") +ggtitle("Enterococcus infections by age") +theme_bw()
ggplot(entero_patients, aes(x = LOS)) +geom_bar() +labs(x ="Length of stay (up to 14 days)", y ="Count") +ggtitle("Enterococcus infections by length of stay") +theme_bw()
#greater than 15 binned in dataggplot(entero_patients, aes(x = LOS_30DAYS)) +geom_bar() +labs(x ="Length of stay greater than 30 days", y ="Count") +ggtitle("Enterococcus infections by length of stay greater than 30 days") +theme_bw()
ggplot(entero_patients, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +ggtitle("Enterococcus infections by discharge month") +theme_bw()
#infections from catheterinfcatheter_patients <-filter(alldiag_2020, DX %in%c("T80211A", "T80211D ", "T80211S", "T80212A", "T80212D", "T80212S", "T80218A", "T80218D", "T80218S", "T80219A", "T80219D", "T80219S"))#T80211A Bloodstream infection due to central venous catheter, initial encounter#T80211D Bloodstream infection due to central venous catheter, subsequent encounter#T80211S Bloodstream infection due to central venous catheter, sequela#T80212A Local infection due to central venous catheter, initial encounter#T80212D Local infection due to central venous catheter, subsequent encounter#T80212S Local infection due to central venous catheter, sequela#T80218A Other infection due to central venous catheter, initial encounter#T80218D Other infection due to central venous catheter, subsequent encounter#T80218S Other infection due to central venous catheter, sequela#T80219A Unspecified infection due to central venous catheter, initial encounter#T80219D Unspecified infection due to central venous catheter, subsequent encounter#T80219S Unspecified infection due to central venous catheter, sequela# no catheter-associated infections in dataset
# ventilator-associated pneumoniaventpenumo_patients <-filter(alldiag_2020, DX =="J95851")#J95851 Ventilator associated pneumonia# no ventilator-associated pneumonia cases in dataset
# surgical site infectionssurginf_patients <-filter(alldiag_2020, DX %in%c("T8141XA", "T8141XD", "T8141XS", "T8142XA", "T8142XD", "T8142XS", "T8143XA", "T8143XD", "T8143XS", "O8600", "O8601", "O8602", "O8603", "O8604", "08609"))#T8141XA Infection following a procedure, superficial incisional surgical site, initial encounter#T8141XD Infection following a procedure, superficial incisional surgical site, subsequent encounter#T8141XS Infection following a procedure, superficial incisional surgical site, sequela#T8142XA Infection following a procedure, deep incisional surgical site, initial encounter#T8142XD Infection following a procedure, deep incisional surgical site, subsequent encounter#T8142XS Infection following a procedure, deep incisional surgical site, sequela#T8143XA Infection following a procedure, organ and space surgical site, initial encounter#T8143XD Infection following a procedure, organ and space surgical site, subsequent encounter#T8143XS Infection following a procedure, organ and space surgical site, sequela#O8600 Infection of obstetric surgical wound, unspecified#O8601 Infection of obstetric surgical wound, superficial incisional site#O8602 Infection of obstetric surgical wound, deep incisional site#O8603 Infection of obstetric surgical wound, organ and space site#O8604 Sepsis following an obstetrical procedure#O8609 Infection of obstetric surgical wound, other surgical site# no surgical site infection cases in dataset
#plot month data together covid_month <-ggplot(covid_patients, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +theme_bw()#adjust margins for cowplotcovid_month <- covid_month +theme(plot.margin =margin(t =20, r =10, b =10, l =10))bactpneumo_month <-ggplot(bactpneumo_patients, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +theme_bw()#adjust margins for cowplotbactpneumo_month <- bactpneumo_month +theme(plot.margin =margin(t =20, r =10, b =10, l =10))cdiff_month <-ggplot(Cdiff_patients, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +theme_bw()#adjust margins for cowplotcdiff_month <- cdiff_month +theme(plot.margin =margin(t =20, r =10, b =10, l =10))entero_month <-ggplot(entero_patients, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +theme_bw()entero_month <- entero_month +theme(plot.margin =margin(t =20, r =10, b =10, l =10))plot_grid(covid_month, bactpneumo_month, cdiff_month, entero_month, labels =c('COVID-19', 'Bacterial pneumonia', 'C. difficile', 'Enterococcus'), label_size =12, label_x =-0.05, label_y =1)
# determine whether discharge month is associated with COVID # filter DISCHARGE_MONTH and DX columnsdx_month <- alldiag_2020 |>select(c(DISCHARGE_MONTH, DX))# Remove NA in DXdx_month_narm <- dx_month[!is.na(dx_month$DX), ]# Remove NA in Discharge Monthdx_month_narm <- dx_month_narm[!is.na(dx_month_narm$DISCHARGE_MONTH), ]# create new column with COVID infection as factorcovid_binary_month <- dx_month_narm |>mutate(COVID =ifelse(DX =="U071", 1, 0))# Fit the logistic modelcovid.month.fit <-glm(COVID ~factor(DISCHARGE_MONTH), data = covid_binary_month, family = binomial)summary(covid.month.fit)
Call:
glm(formula = COVID ~ factor(DISCHARGE_MONTH), family = binomial,
data = covid_binary_month)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -11.835 1.000 -11.835 < 2e-16 ***
factor(DISCHARGE_MONTH)2 -8.731 49.757 -0.175 0.861
factor(DISCHARGE_MONTH)3 1.735 1.095 1.583 0.113
factor(DISCHARGE_MONTH)4 7.502 1.000 7.499 6.43e-14 ***
factor(DISCHARGE_MONTH)5 6.895 1.001 6.891 5.56e-12 ***
factor(DISCHARGE_MONTH)6 6.186 1.001 6.179 6.45e-10 ***
factor(DISCHARGE_MONTH)7 6.152 1.001 6.145 8.02e-10 ***
factor(DISCHARGE_MONTH)8 5.961 1.001 5.953 2.63e-09 ***
factor(DISCHARGE_MONTH)9 5.601 1.002 5.590 2.27e-08 ***
factor(DISCHARGE_MONTH)10 6.136 1.001 6.130 8.81e-10 ***
factor(DISCHARGE_MONTH)11 6.917 1.001 6.913 4.74e-12 ***
factor(DISCHARGE_MONTH)12 7.404 1.000 7.402 1.34e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 83171 on 1498051 degrees of freedom
Residual deviance: 76909 on 1498040 degrees of freedom
AIC: 76933
Number of Fisher Scoring iterations: 19
COVID-19 infection was positively associated with April~December, which makes sense given the first cases began in April in the data set.
# determine whether discharge month is associated with bacterial pneumonia bactpneumo_code <-c("J13", "J14", "J150", "J151", "J1520", "J15211", "J1529", "J153", "J154", "J155", "J1561", "J1569", "J157", "J158", "J159", "J160")# create new column with COVID infection as factorbactpneumo_binary_month <- alldiag_2020 |>mutate(bactpneumo =ifelse(DX %in% bactpneumo_code, 1, 0))# Fit the linear modelbactpneumo.month.fit <-glm(bactpneumo ~factor(DISCHARGE_MONTH), data = bactpneumo_binary_month, family = binomial)summary(bactpneumo.month.fit)
Call:
glm(formula = bactpneumo ~ factor(DISCHARGE_MONTH), family = binomial,
data = bactpneumo_binary_month)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.61100 0.07334 -103.774 < 2e-16 ***
factor(DISCHARGE_MONTH)2 -0.29591 0.11460 -2.582 0.009821 **
factor(DISCHARGE_MONTH)3 0.12533 0.10358 1.210 0.226310
factor(DISCHARGE_MONTH)4 0.27074 0.10598 2.555 0.010626 *
factor(DISCHARGE_MONTH)5 0.08101 0.10749 0.754 0.451060
factor(DISCHARGE_MONTH)6 -0.36646 0.12065 -3.037 0.002387 **
factor(DISCHARGE_MONTH)7 -0.51773 0.12323 -4.201 2.65e-05 ***
factor(DISCHARGE_MONTH)8 -0.41547 0.11996 -3.463 0.000534 ***
factor(DISCHARGE_MONTH)9 -0.70243 0.13258 -5.298 1.17e-07 ***
factor(DISCHARGE_MONTH)10 -0.39638 0.11741 -3.376 0.000735 ***
factor(DISCHARGE_MONTH)11 -0.15140 0.11238 -1.347 0.177909
factor(DISCHARGE_MONTH)12 0.05973 0.10518 0.568 0.570118
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 29344 on 3980819 degrees of freedom
Residual deviance: 29215 on 3980808 degrees of freedom
AIC: 29239
Number of Fisher Scoring iterations: 11
Bacterial pneumonia was negatively associated with discharge months Jan, Feb, and Jun~Oct and positively associated with April.
# determine whether discharge month is associated with C. difficile infection # create new column with C. diff infection as factorcdiff_binary_month <- alldiag_2020 |>mutate(cdiff =ifelse(DX =="A047", 1, 0))# Fit the linear modelcdiff.month.fit <-glm(cdiff ~factor(DISCHARGE_MONTH), data = cdiff_binary_month, family = binomial)summary(cdiff.month.fit)
Call:
glm(formula = cdiff ~ factor(DISCHARGE_MONTH), family = binomial,
data = cdiff_binary_month)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.161890 0.096711 -74.054 <2e-16 ***
factor(DISCHARGE_MONTH)2 -0.182908 0.146820 -1.246 0.2128
factor(DISCHARGE_MONTH)3 -0.015104 0.141824 -0.106 0.9152
factor(DISCHARGE_MONTH)4 -0.150278 0.158077 -0.951 0.3418
factor(DISCHARGE_MONTH)5 -0.228811 0.153778 -1.488 0.1368
factor(DISCHARGE_MONTH)6 -0.097025 0.145829 -0.665 0.5058
factor(DISCHARGE_MONTH)7 -0.287661 0.150062 -1.917 0.0552 .
factor(DISCHARGE_MONTH)8 -0.315877 0.151853 -2.080 0.0375 *
factor(DISCHARGE_MONTH)9 -0.003822 0.139506 -0.027 0.9781
factor(DISCHARGE_MONTH)10 -0.263541 0.147330 -1.789 0.0736 .
factor(DISCHARGE_MONTH)11 -0.150290 0.145828 -1.031 0.3027
factor(DISCHARGE_MONTH)12 -0.259636 0.148387 -1.750 0.0802 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 16506 on 1498051 degrees of freedom
Residual deviance: 16494 on 1498040 degrees of freedom
(2482768 observations deleted due to missingness)
AIC: 16518
Number of Fisher Scoring iterations: 10
Only January and August was negatively associated with C. difficile infection indicating that there is no seasonality in cases.
# determine whether discharge month is associated with Enterococcus infection entero_code <-c("A4181", "B952")# create new column with Enterococcus infection as factorentero_binary_month <- alldiag_2020 |>mutate(entero =ifelse(DX %in% entero_code, 1, 0))# Fit the linear modelentero.month.fit <-glm(entero ~factor(DISCHARGE_MONTH), data = entero_binary_month, family = binomial)summary(entero.month.fit)
Call:
glm(formula = entero ~ factor(DISCHARGE_MONTH), family = binomial,
data = entero_binary_month)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.88586 0.13868 -64.072 <2e-16 ***
factor(DISCHARGE_MONTH)2 -0.16754 0.20887 -0.802 0.4225
factor(DISCHARGE_MONTH)3 -0.04716 0.20485 -0.230 0.8179
factor(DISCHARGE_MONTH)4 -0.07022 0.22057 -0.318 0.7502
factor(DISCHARGE_MONTH)5 0.19973 0.19709 1.013 0.3109
factor(DISCHARGE_MONTH)6 -0.11961 0.21184 -0.565 0.5723
factor(DISCHARGE_MONTH)7 0.15738 0.19260 0.817 0.4139
factor(DISCHARGE_MONTH)8 -0.41638 0.22692 -1.835 0.0665 .
factor(DISCHARGE_MONTH)9 0.01575 0.20128 0.078 0.9376
factor(DISCHARGE_MONTH)10 -0.02964 0.20017 -0.148 0.8823
factor(DISCHARGE_MONTH)11 0.02457 0.20242 0.121 0.9034
factor(DISCHARGE_MONTH)12 0.13403 0.19520 0.687 0.4923
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 10733 on 3980819 degrees of freedom
Residual deviance: 10721 on 3980808 degrees of freedom
AIC: 10745
Number of Fisher Scoring iterations: 12
Similar to C. difficle, Enterococcus infection was only negatively associated with January and August indicating no seasonality.
#plot age data together covid_age <-ggplot(covid_patients, aes(x = age.simplified)) +geom_bar() +labs(x ="Age", y ="Count") +theme_bw() +theme(axis.text.x =element_text(angle =45, hjust =1))#adjust margins for cowplotcovid_age <- covid_age +theme(plot.margin =margin(t =20, r =10, b =10, l =10))bactpneumo_age <-ggplot(bactpneumo_patients, aes(x = age.simplified)) +geom_bar() +labs(x ="Age", y ="Count") +theme_bw() +theme(axis.text.x =element_text(angle =45, hjust =1))#adjust margins for cowplotbactpneumo_age <- bactpneumo_age +theme(plot.margin =margin(t =20, r =10, b =10, l =10))cdiff_age <-ggplot(Cdiff_patients, aes(x = age.simplified)) +geom_bar() +labs(x ="Age", y ="Count") +theme_bw() +theme(axis.text.x =element_text(angle =45, hjust =1))#adjust margins for cowplotcdiff_age <- cdiff_age +theme(plot.margin =margin(t =20, r =10, b =10, l =10))entero_age <-ggplot(entero_patients, aes(x = age.simplified)) +geom_bar() +labs(x ="Age", y ="Count") +theme_bw() +theme(axis.text.x =element_text(angle =45, hjust =1))entero_age <- entero_age +theme(plot.margin =margin(t =20, r =10, b =10, l =10))plot_grid(covid_age, bactpneumo_age, cdiff_age, entero_age, labels =c('COVID-19', 'Bacterial pneumonia', 'C. difficile', 'Enterococcus'), label_size =12, label_x =-0.05, label_y =1)
# proportions by discharge month and ageggplot(covid_patients, aes(x = DISCHARGE_MONTH, fill = age.simplified)) +geom_bar(position ="fill") +scale_fill_brewer(palette ="Spectral") +scale_x_continuous(breaks =1:12) +labs(x ="Discharge Month", y ="Proportion", fill ="Age") +ggtitle("COVID-19 cases") +theme_bw()
ggplot(bactpneumo_patients, aes(x = DISCHARGE_MONTH, fill = age.simplified)) +geom_bar(position ="fill") +scale_fill_brewer(palette ="Spectral") +scale_x_continuous(breaks =1:12) +labs(x ="Discharge Month", y ="Proportion", fill ="Age") +ggtitle("Bacterial pneumonia cases") +theme_bw()
ggplot(Cdiff_patients, aes(x = DISCHARGE_MONTH, fill = age.simplified)) +geom_bar(position ="fill") +scale_fill_brewer(palette ="Spectral") +scale_x_continuous(breaks =1:12) +labs(x ="Discharge Month", y ="Proportion", fill ="Age") +ggtitle("C. difficile infections") +theme_bw()
ggplot(entero_patients, aes(x = DISCHARGE_MONTH, fill = age.simplified)) +geom_bar(position ="fill") +scale_fill_brewer(palette ="Spectral") +scale_x_continuous(breaks =1:12) +labs(x ="Discharge Month", y ="Proportion", fill ="Age") +ggtitle("Enterococcus infections") +theme_bw()
Bacterial pneumonia follows a similar seasonality as COVID-19 as expected, but C. difficile and Enterococcus infections appears to be pretty consistent throughout the year.
All 4 infections predominantly affects older individuals (highest 60-69).
There were no cases of infections associated with surgeries, infections from catheter use, methicillin-resistant Staphylococcus aureus infections, or ventilator-associated pneumonia in the dataset.
# Add descriptions to diagnosis# Read the lines from the filelines <-readLines("./2021-code-descriptions-tabular-order/icd10cm_codes_2021.txt")# Split each line by 2 or more spacessplit_lines <-strsplit(lines, "\\s{2,}")# Find the maximum number of columnsmax_length <-max(sapply(split_lines, length))# Pad shorter rows with NApadded_lines <-lapply(split_lines, function(x) {length(x) <- max_length # Extend the lengthreturn(x) # Returns the padded line})# Convert the list to a data frameicd_code <-do.call(rbind, lapply(padded_lines, function(x) as.data.frame(t(x), stringsAsFactors =FALSE)))# Optionally, set column names if neededcolnames(icd_code) <-c("DX", "Description")alldiag_desc <-left_join(alldiag_2020, icd_code)
Joining with `by = join_by(DX)`
# diagnosis arranged by number of casesalldiag_desc |>group_by(Description) |>summarize(count =n()) |>arrange(desc(count))
# A tibble: 3,814 × 2
Description count
<chr> <int>
1 <NA> 3217815
2 Essential (primary) hypertension 30922
3 Hyperlipidemia, unspecified 27947
4 Acute kidney failure, unspecified 18190
5 Gastro-esophageal reflux disease without esophagitis 17444
6 Single live birth 12200
7 Encounter for immunization 11723
8 Major depressive disorder, single episode, unspecified 11503
9 Anxiety disorder, unspecified 11406
10 Hypo-osmolality and hyponatremia 10649
# ℹ 3,804 more rows
# determine whether primary (DX1) COVID diagnosis is associated with other hospital-associated infectionsalldiag_covid_binary <- var_2020 |>mutate(COVID =ifelse(rowSums(select(var_2020, 21:50) =="U071", na.rm =TRUE) >0, 1, 0))
# Fit the linear modelcovid.bactpneumo.fit <-glm(bactpneumo ~ COVID, data = alldiag_bactpneumo_binary, family = binomial)summary(covid.bactpneumo.fit)
Call:
glm(formula = bactpneumo ~ COVID, family = binomial, data = alldiag_bactpneumo_binary)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.2344 0.1049 -68.99 < 2e-16 ***
COVID 1.6850 0.2262 7.45 9.33e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1865.7 on 132693 degrees of freedom
Residual deviance: 1826.3 on 132692 degrees of freedom
AIC: 1830.3
Number of Fisher Scoring iterations: 10
A positive coefficient (1.69) and a p-value less than 0.001 indicates that there is a very strong positive association between bacterial pneumonia and COVID diagnosis.
# determine whether COVID is associated with C. difficile colitisalldiag_cdiff_binary <- alldiag_covid_binary |>mutate(Cdiff =ifelse(rowSums(select(alldiag_covid_binary, 21:50) =="A047", na.rm =TRUE) >0, 1, 0))alldiag_cdiff_factor <- alldiag_cdiff_binary %>%mutate(COVID =factor(COVID, levels =c(0, 1), labels =c("Negative", "Positive")),Cdiff =factor(Cdiff, levels =c(0,1), labels =c("Negative", "Positive")))# Create the plot with labeled factorsggplot(alldiag_cdiff_factor, aes(x = Cdiff, fill = COVID)) +geom_bar(position ="fill") +scale_fill_manual(values =c("Negative"="#aaaaaa", "Positive"="#b20000")) +theme_bw() +labs(x ="C. difficile infection", y ="Proportion", fill ="COVID Status")
# Fit the linear modelcovid.cdiff.fit <-glm(Cdiff ~ COVID, data = alldiag_cdiff_binary, family = binomial)summary(covid.cdiff.fit)
Call:
glm(formula = Cdiff ~ COVID, family = binomial, data = alldiag_cdiff_binary)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.89046 0.03270 -149.548 <2e-16 ***
COVID 0.03812 0.14568 0.262 0.794
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 11690 on 132693 degrees of freedom
Residual deviance: 11690 on 132692 degrees of freedom
AIC: 11694
Number of Fisher Scoring iterations: 7
There is no association between C. difficile colitis and COVID-19 diagnosis
# determine whether COVID diagnosis is associated with Enterococcus infectionalldiag_entero_binary <- alldiag_covid_binary |>mutate(Entero =ifelse(rowSums(select(alldiag_covid_binary, 21:50) == entero_code, na.rm =TRUE) >0, 1, 0))alldiag_entero_factor <- alldiag_entero_binary %>%mutate(COVID =factor(COVID, levels =c(0, 1), labels =c("Negative", "Positive")),Entero =factor(Entero, levels =c(0,1), labels =c("Negative", "Positive")))# Create the plot with labeled factorsggplot(alldiag_entero_factor, aes(x = Entero, fill = COVID)) +geom_bar(position ="fill") +scale_fill_manual(values =c("Negative"="#aaaaaa", "Positive"="#b20000")) +theme_bw() +labs(x ="Enterococcus infection", y ="Proportion", fill ="COVID Status")
# Fit the linear modelcovid.entero.fit <-glm(Entero ~ COVID, data = alldiag_entero_binary, family = binomial)summary(covid.entero.fit)
Call:
glm(formula = Entero ~ COVID, family = binomial, data = alldiag_entero_binary)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.21056 0.06293 -98.686 <2e-16 ***
COVID 0.21349 0.25810 0.827 0.408
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3873.6 on 132693 degrees of freedom
Residual deviance: 3873.0 on 132692 degrees of freedom
AIC: 3877
Number of Fisher Scoring iterations: 9
There is no association between Enterococus infection and COVID-19 diagnosis
# Determine whether Age is associated with COVID-19# filter DISCHARGE_MONTH and DX columnsdx_age <- alldiag_desc |>select(c(AGE, DX))# Remove NA in DXdx_age_narm <- dx_age[!is.na(dx_age$DX), ]# Remove NA in Discharge Monthdx_age_narm <- dx_age_narm[!is.na(dx_age_narm$AGE), ]# create new column with COVID infection as factorcovid_binary_age <- dx_age_narm |>mutate(COVID =ifelse(DX =="U071", 1, 0))# Fit the logistic modelcovid.age.fit <-glm(COVID ~ AGE, data = covid_binary_age, family = binomial)summary(covid.age.fit)
Call:
glm(formula = COVID ~ AGE, family = binomial, data = covid_binary_age)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.9510096 0.0380230 -156.51 <2e-16 ***
AGE 0.0086774 0.0005929 14.64 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 83128 on 1496830 degrees of freedom
Residual deviance: 82901 on 1496829 degrees of freedom
AIC: 82905
Number of Fisher Scoring iterations: 8
Age is positively associated with COVID-19 diagnosis.
Emergency department data set
# Read in NHCS 2020 emergency department R Dataseturl2 <-"https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHCS/2020/R/nhcs2020ed_r.rds"nhcs2020ed <-read_rds(url2)
# emergency department dataset#pull out variables listvar_2020_ed <- nhcs2020ed$variables#select useful columnsvar_2020_ed_select <-select(var_2020_ed, 1:37)#variable names varnames_2020_ed <-colnames(var_2020_ed_select)#pull out primary diagnosis (DX1)diag1_2020_ed <- var_2020_ed_select$DX1#determine the number of primary C. diff infections (ICD-10 diagnosis code: A047)primaryCdiff_2020_ed <-sum(diag1_2020_ed =="A047", na.rm =TRUE)primaryCdiff_2020_ed
[1] 151
There were 151 cases of primary C.diff infection into the emergency department
# list primary diagnosis by countvar_2020_ed_select |>group_by(DX1) |>summarize(count =n()) |>arrange(desc(count))
# the code doesn't match exactly, for e.g. R078 has R0781, R0782, R0789 so can't accurately add descriptions. Left as is. The highest counts are chest pain, COVID-19, respiratory infection, joint pain, abdominal pain.
# A tibble: 9 × 2
DISCHARGE_STATUS count
<fct> <int>
1 Routine to home 314022
2 <NA> 27915
3 Other 22527
4 Left against medical advice 7554
5 Home health care 6093
6 Transfer to short term facility 5279
7 Transfer to long term facility 2115
8 Dead 2063
9 Hospice care - home or medical facility 1185
#visualizationggplot(covid_patients_ed, aes(x = SEX)) +geom_bar() +labs(x ="Sex", y ="Count") +ggtitle("COVID-19 cases by sex") +theme_bw()
ggplot(covid_patients_ed, aes(x = age.simplified)) +geom_bar() +labs(x ="Age", y ="Count") +ggtitle("COVID-19 cases by age") +theme_bw()
ggplot(covid_patients_ed, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +ggtitle("COVID-19 cases by discharge month") +theme_bw()
#subset bacterial pneumonia patientsbactpneumo_patients_ed <-filter(alldiag_2020_ed, DX %in%c("J13", "J14", "J150", "J151", "J1520", "J15211", "J1529", "J153", "J154", "J155", "J1561", "J1569", "J157", "J158", "J159", "J160"))#J13 Pneumonia due to Streptococcus pneumoniae#J14 Pneumonia due to Hemophilus influenzae#J150 Pneumonia due to Klebsiella pneumoniae#J151 Pneumonia due to Pseudomonas#J1520 Pneumonia due to staphylococcus, unspecified#J15211 Pneumonia due to Methicillin susceptible Staphylococcus aureus#J15212 Pneumonia due to Methicillin resistant Staphylococcus aureus#J1529 Pneumonia due to other staphylococcus#J153 Pneumonia due to streptococcus, group B#J154 Pneumonia due to other streptococci#J155 Pneumonia due to Escherichia coli#J1561 Pneumonia due to Acinetobacter baumannii#J1569 Pneumonia due to other Gram-negative bacteria#J157 Pneumonia due to Mycoplasma pneumoniae#J158 Pneumonia due to other specified bacteria#J159 Unspecified bacterial pneumonia#J160 Chlamydial pneumonia
#plot number of cases by demographics#re-level ageCdiff_patients_ed <- Cdiff_patients_ed |>mutate(age.simplified =cut(AGE, breaks =c(0, 5, 10, 15, 20, seq(30, 100, by =10)), right =FALSE, labels =c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")))#count(Cdiff_patients_ed, age.simplified)#count(Cdiff_patients_ed, SEX)#count(Cdiff_patients_ed, DISCHARGE_MONTH)#count(Cdiff_patients_ed, DISCHARGE_STATUS)
#visualizationggplot(Cdiff_patients_ed, aes(x = SEX)) +geom_bar() +labs(x ="Sex", y ="Count") +ggtitle("C. difficile infection by sex") +theme_bw()
ggplot(Cdiff_patients_ed, aes(x = age.simplified)) +geom_bar() +labs(x ="Age", y ="Count") +ggtitle("C. difficile infection by age") +theme_bw()
ggplot(Cdiff_patients_ed, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +ggtitle("C. difficile infection by discharge month") +theme_bw()
#MRSAmrsa_patients_ed <-filter(alldiag_2020_ed, DX %in%c("A4102", "A4902", "B9562", "J15212"))#A4102 Sepsis due to Methicillin resistant Staphylococcus aureus#A4902 Methicillin resistant Staphylococcus aureus infection, unspecified site#B9562 Methicillin resistant Staphylococcus aureus infection as the cause of diseases classified elsewhere#J15212 Pneumonia due to Methicillin resistant Staphylococcus aureus# no MRSA cases in dataset
#enterococcusentero_patients_ed <-filter(alldiag_2020_ed, DX %in%c("A4181", "B952"))#A4181 Sepsis due to Enterococcus#B952 Enterococcus as the cause of diseases classified elsewhere
#plot number of cases by demographics#re-level ageentero_patients_ed <- entero_patients_ed |>mutate(age.simplified =cut(AGE, breaks =c(0, 5, 10, 15, 20, seq(30, 100, by =10)), right =FALSE, labels =c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")))#count(entero_patients_ed, age.simplified)#count(entero_patients_ed, SEX)#count(entero_patients_ed, DISCHARGE_MONTH)#count(entero_patients_ed, DISCHARGE_STATUS)
#visualizationggplot(entero_patients_ed, aes(x = SEX)) +geom_bar() +labs(x ="Sex", y ="Count") +ggtitle("Enterococcus infections by sex") +theme_bw()
ggplot(entero_patients_ed, aes(x = age.simplified)) +geom_bar() +labs(x ="Age", y ="Count") +ggtitle("Enterococcus infections by age") +theme_bw()
ggplot(entero_patients_ed, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +ggtitle("Enterococcus infections by discharge month") +theme_bw()
#infections from catheterinfcatheter_patients_ed <-filter(alldiag_2020_ed, DX %in%c("T80211A", "T80211D ", "T80211S", "T80212A", "T80212D", "T80212S", "T80218A", "T80218D", "T80218S", "T80219A", "T80219D", "T80219S"))#T80211A Bloodstream infection due to central venous catheter, initial encounter#T80211D Bloodstream infection due to central venous catheter, subsequent encounter#T80211S Bloodstream infection due to central venous catheter, sequela#T80212A Local infection due to central venous catheter, initial encounter#T80212D Local infection due to central venous catheter, subsequent encounter#T80212S Local infection due to central venous catheter, sequela#T80218A Other infection due to central venous catheter, initial encounter#T80218D Other infection due to central venous catheter, subsequent encounter#T80218S Other infection due to central venous catheter, sequela#T80219A Unspecified infection due to central venous catheter, initial encounter#T80219D Unspecified infection due to central venous catheter, subsequent encounter#T80219S Unspecified infection due to central venous catheter, sequela# no catheter-associated infections in dataset
# surgical site infectionssurginf_patients_ed <-filter(alldiag_2020_ed, DX %in%c("T8141XA", "T8141XD", "T8141XS", "T8142XA", "T8142XD", "T8142XS", "T8143XA", "T8143XD", "T8143XS", "O8600", "O8601", "O8602", "O8603", "O8604", "08609"))#T8141XA Infection following a procedure, superficial incisional surgical site, initial encounter#T8141XD Infection following a procedure, superficial incisional surgical site, subsequent encounter#T8141XS Infection following a procedure, superficial incisional surgical site, sequela#T8142XA Infection following a procedure, deep incisional surgical site, initial encounter#T8142XD Infection following a procedure, deep incisional surgical site, subsequent encounter#T8142XS Infection following a procedure, deep incisional surgical site, sequela#T8143XA Infection following a procedure, organ and space surgical site, initial encounter#T8143XD Infection following a procedure, organ and space surgical site, subsequent encounter#T8143XS Infection following a procedure, organ and space surgical site, sequela#O8600 Infection of obstetric surgical wound, unspecified#O8601 Infection of obstetric surgical wound, superficial incisional site#O8602 Infection of obstetric surgical wound, deep incisional site#O8603 Infection of obstetric surgical wound, organ and space site#O8604 Sepsis following an obstetrical procedure#O8609 Infection of obstetric surgical wound, other surgical site# no cases in dataset
#plot month data together covid_month_ed <-ggplot(covid_patients_ed, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +theme_bw()#adjust margins for cowplotcovid_month_ed <- covid_month_ed +theme(plot.margin =margin(t =20, r =10, b =10, l =10))bactpneumo_month_ed <-ggplot(bactpneumo_patients_ed, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +theme_bw()#adjust margins for cowplotbactpneumo_month_ed <- bactpneumo_month_ed +theme(plot.margin =margin(t =20, r =10, b =10, l =10))cdiff_month_ed <-ggplot(Cdiff_patients_ed, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +theme_bw()#adjust margins for cowplotcdiff_month_ed <- cdiff_month_ed +theme(plot.margin =margin(t =20, r =10, b =10, l =10))entero_month_ed <-ggplot(entero_patients_ed, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +theme_bw()entero_month_ed <- entero_month_ed +theme(plot.margin =margin(t =20, r =10, b =10, l =10))plot_grid(covid_month_ed, bactpneumo_month_ed, cdiff_month_ed, entero_month_ed, labels =c('COVID-19', 'Bacterial pneumonia', 'C. difficile', 'Enterococcus'), label_size =12, label_x =-0.05, label_y =1)
# similar trend between inpatient and emergency room (dipped in the summer, surged in the winter)plot_grid(bactpneumo_month, bactpneumo_month_ed, labels =c('Bact pneumonia (inpatient)', 'Bact pneumonia (ER)'), label_size =12, label_x =-0.05, label_y =1)
# similar trend to covid; no difference between inpatient and ERplot_grid(cdiff_month, cdiff_month_ed, labels =c('C.diff (inpatient)', 'C.diff (emergency room)'), label_size =12, label_x =-0.05, label_y =1)
# more seasonality in the emergency room data for C. diffplot_grid(entero_month, entero_month_ed, labels =c('Enterococcus (inpatient)', 'Enterococcus (ER)'), label_size =12, label_x =-0.05, label_y =1)
#ER data seamingly random (surge in June); doesn't seem to coincide with COVID case
Overall, the inpatient data and emergency room data follows a similar trend by month. There may be more seasonality in the C. difficile emergency room data however, with a drop in cases between April and August.
4 Results
Describe your results and include relevant tables, plots, and code/comments used to obtain them. You may refer to the Section 3 as needed. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you’d like, but this is not required.
I chose to analyze the inpatient data only, because >80% of emergency room visitors were sent home while only 16% of inpatient cases were sent home the same day. Length of stay in the hospital is a major factor in acquiring hospital-associated infection. Overall the month-to-month trends for COVID-19 and hospital infections between inpatient and emergency room visits appeared similar.
prop_ip_routine
sum(count)
1 0.1610849
# 16% of inpatient visits stayed only 1 day
prop_ed_routine
sum(count)
1 0.8077674
# 81% routine to home from emergency deparment
All 4 infections (COVID-19, bacterial pneumonia, C. difficile, Enterococcus) predominantly affected older populations. For all 4, the highest number of cases was between the ages of 60-69.
As expected, there was a significant positive correlation between discharge month (which I used as a proxy for diagnosis month since that wasn’t available) and COVID-19 diagnosis. COVID-19 was positively associated with April~December, coinciding with the emergence of COVID-19 infections. It also appears that there is a seasonality to COVID-19 cases, where cases dropped in the summer months and surged in the winter.
ggplot(covid_patients, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +ggtitle("COVID-19 cases by discharge month") +theme_bw()
summary(covid.month.fit)
Call:
glm(formula = COVID ~ factor(DISCHARGE_MONTH), family = binomial,
data = covid_binary_month)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -11.835 1.000 -11.835 < 2e-16 ***
factor(DISCHARGE_MONTH)2 -8.731 49.757 -0.175 0.861
factor(DISCHARGE_MONTH)3 1.735 1.095 1.583 0.113
factor(DISCHARGE_MONTH)4 7.502 1.000 7.499 6.43e-14 ***
factor(DISCHARGE_MONTH)5 6.895 1.001 6.891 5.56e-12 ***
factor(DISCHARGE_MONTH)6 6.186 1.001 6.179 6.45e-10 ***
factor(DISCHARGE_MONTH)7 6.152 1.001 6.145 8.02e-10 ***
factor(DISCHARGE_MONTH)8 5.961 1.001 5.953 2.63e-09 ***
factor(DISCHARGE_MONTH)9 5.601 1.002 5.590 2.27e-08 ***
factor(DISCHARGE_MONTH)10 6.136 1.001 6.130 8.81e-10 ***
factor(DISCHARGE_MONTH)11 6.917 1.001 6.913 4.74e-12 ***
factor(DISCHARGE_MONTH)12 7.404 1.000 7.402 1.34e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 83171 on 1498051 degrees of freedom
Residual deviance: 76909 on 1498040 degrees of freedom
AIC: 76933
Number of Fisher Scoring iterations: 19
Bacterial pneumonia had a similar seasonality as COVID-19, dropping from June to October and surging in the colder months. Analysis of the association of COVID-19 infection and bacterial pneumonia diagnosis reveals a significant positive association, indicating that COVID-19 infection increases the risk of bacterial pneumonia. Interestingly, the incidence of bacterial pneumonia by age changed throughout 2020, where children under the age of 10 that made up nearly 20% of cases in January significantly decreased, likely due to changes in exposure from the COVID-19 pandemic.
ggplot(bactpneumo_patients, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +ggtitle("Bacterial penumonia cases by discharge month") +theme_bw()
summary(bactpneumo.month.fit)
Call:
glm(formula = bactpneumo ~ factor(DISCHARGE_MONTH), family = binomial,
data = bactpneumo_binary_month)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.61100 0.07334 -103.774 < 2e-16 ***
factor(DISCHARGE_MONTH)2 -0.29591 0.11460 -2.582 0.009821 **
factor(DISCHARGE_MONTH)3 0.12533 0.10358 1.210 0.226310
factor(DISCHARGE_MONTH)4 0.27074 0.10598 2.555 0.010626 *
factor(DISCHARGE_MONTH)5 0.08101 0.10749 0.754 0.451060
factor(DISCHARGE_MONTH)6 -0.36646 0.12065 -3.037 0.002387 **
factor(DISCHARGE_MONTH)7 -0.51773 0.12323 -4.201 2.65e-05 ***
factor(DISCHARGE_MONTH)8 -0.41547 0.11996 -3.463 0.000534 ***
factor(DISCHARGE_MONTH)9 -0.70243 0.13258 -5.298 1.17e-07 ***
factor(DISCHARGE_MONTH)10 -0.39638 0.11741 -3.376 0.000735 ***
factor(DISCHARGE_MONTH)11 -0.15140 0.11238 -1.347 0.177909
factor(DISCHARGE_MONTH)12 0.05973 0.10518 0.568 0.570118
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 29344 on 3980819 degrees of freedom
Residual deviance: 29215 on 3980808 degrees of freedom
AIC: 29239
Number of Fisher Scoring iterations: 11
# Create the plot with labeled factorsggplot(alldiag_bactpneumo_factor, aes(x = bactpneumo, fill = COVID)) +geom_bar(position ="fill") +scale_fill_manual(values =c("Negative"="#aaaaaa", "Positive"="#b20000")) +theme_bw() +labs(x ="Bacterial Pneumonia", y ="Proportion", fill ="COVID Status")
summary(covid.bactpneumo.fit)
Call:
glm(formula = bactpneumo ~ COVID, family = binomial, data = alldiag_bactpneumo_binary)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.2344 0.1049 -68.99 < 2e-16 ***
COVID 1.6850 0.2262 7.45 9.33e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1865.7 on 132693 degrees of freedom
Residual deviance: 1826.3 on 132692 degrees of freedom
AIC: 1830.3
Number of Fisher Scoring iterations: 10
ggplot(bactpneumo_patients, aes(x = DISCHARGE_MONTH, fill = age.simplified)) +geom_bar(position ="fill") +scale_fill_brewer(palette ="Spectral") +scale_x_continuous(breaks =1:12) +labs(x ="Discharge Month", y ="Proportion", fill ="Age") +ggtitle("Bacterial pneumonia cases") +theme_bw()
I next performed a similar analysis for two bacterial pathogens commonly associated with hospital-associated infections outside the lungs. There was no clear seasonality with C. difficile infection; cases remained relatively stable throughout the year.
ggplot(Cdiff_patients, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +ggtitle("C. difficile infection by discharge month") +theme_bw()
summary(cdiff.month.fit)
Call:
glm(formula = cdiff ~ factor(DISCHARGE_MONTH), family = binomial,
data = cdiff_binary_month)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.161890 0.096711 -74.054 <2e-16 ***
factor(DISCHARGE_MONTH)2 -0.182908 0.146820 -1.246 0.2128
factor(DISCHARGE_MONTH)3 -0.015104 0.141824 -0.106 0.9152
factor(DISCHARGE_MONTH)4 -0.150278 0.158077 -0.951 0.3418
factor(DISCHARGE_MONTH)5 -0.228811 0.153778 -1.488 0.1368
factor(DISCHARGE_MONTH)6 -0.097025 0.145829 -0.665 0.5058
factor(DISCHARGE_MONTH)7 -0.287661 0.150062 -1.917 0.0552 .
factor(DISCHARGE_MONTH)8 -0.315877 0.151853 -2.080 0.0375 *
factor(DISCHARGE_MONTH)9 -0.003822 0.139506 -0.027 0.9781
factor(DISCHARGE_MONTH)10 -0.263541 0.147330 -1.789 0.0736 .
factor(DISCHARGE_MONTH)11 -0.150290 0.145828 -1.031 0.3027
factor(DISCHARGE_MONTH)12 -0.259636 0.148387 -1.750 0.0802 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 16506 on 1498051 degrees of freedom
Residual deviance: 16494 on 1498040 degrees of freedom
(2482768 observations deleted due to missingness)
AIC: 16518
Number of Fisher Scoring iterations: 10
Similarly, the incidence of Enterococcus infection did not have a clear seasonality.
ggplot(entero_patients, aes(x = DISCHARGE_MONTH)) +geom_bar() +scale_x_continuous(breaks =1:12) +labs(x ="Month", y ="Count") +ggtitle("Enterococcus infections by discharge month") +theme_bw()
summary(entero.month.fit)
Call:
glm(formula = entero ~ factor(DISCHARGE_MONTH), family = binomial,
data = entero_binary_month)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.88586 0.13868 -64.072 <2e-16 ***
factor(DISCHARGE_MONTH)2 -0.16754 0.20887 -0.802 0.4225
factor(DISCHARGE_MONTH)3 -0.04716 0.20485 -0.230 0.8179
factor(DISCHARGE_MONTH)4 -0.07022 0.22057 -0.318 0.7502
factor(DISCHARGE_MONTH)5 0.19973 0.19709 1.013 0.3109
factor(DISCHARGE_MONTH)6 -0.11961 0.21184 -0.565 0.5723
factor(DISCHARGE_MONTH)7 0.15738 0.19260 0.817 0.4139
factor(DISCHARGE_MONTH)8 -0.41638 0.22692 -1.835 0.0665 .
factor(DISCHARGE_MONTH)9 0.01575 0.20128 0.078 0.9376
factor(DISCHARGE_MONTH)10 -0.02964 0.20017 -0.148 0.8823
factor(DISCHARGE_MONTH)11 0.02457 0.20242 0.121 0.9034
factor(DISCHARGE_MONTH)12 0.13403 0.19520 0.687 0.4923
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 10733 on 3980819 degrees of freedom
Residual deviance: 10721 on 3980808 degrees of freedom
AIC: 10745
Number of Fisher Scoring iterations: 12
Furthermore, there was a lack of association between COVID-19 diagnosis and C. difficile or Enterococcus co-infection.
# Create the plot with labeled factorsggplot(alldiag_cdiff_factor, aes(x = Cdiff, fill = COVID)) +geom_bar(position ="fill") +scale_fill_manual(values =c("Negative"="#aaaaaa", "Positive"="#b20000")) +theme_bw() +labs(x ="C. difficile infection", y ="Proportion", fill ="COVID Status")
# Fit the linear modelcovid.cdiff.fit <-glm(Cdiff ~ COVID, data = alldiag_cdiff_binary, family = binomial)summary(covid.cdiff.fit)
Call:
glm(formula = Cdiff ~ COVID, family = binomial, data = alldiag_cdiff_binary)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.89046 0.03270 -149.548 <2e-16 ***
COVID 0.03812 0.14568 0.262 0.794
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 11690 on 132693 degrees of freedom
Residual deviance: 11690 on 132692 degrees of freedom
AIC: 11694
Number of Fisher Scoring iterations: 7
# Create the plot with labeled factorsggplot(alldiag_entero_factor, aes(x = Entero, fill = COVID)) +geom_bar(position ="fill") +scale_fill_manual(values =c("Negative"="#aaaaaa", "Positive"="#b20000")) +theme_bw() +labs(x ="Enterococcus infection", y ="Proportion", fill ="COVID Status")
summary(covid.entero.fit)
Call:
glm(formula = Entero ~ COVID, family = binomial, data = alldiag_entero_binary)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.21056 0.06293 -98.686 <2e-16 ***
COVID 0.21349 0.25810 0.827 0.408
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3873.6 on 132693 degrees of freedom
Residual deviance: 3873.0 on 132692 degrees of freedom
AIC: 3877
Number of Fisher Scoring iterations: 9
5 Conclusion
This the conclusion. The Section 4 can be invoked here.